two major constraints: data not collected and questions not asked. First, although billed as “final,” 
data on teacher retention and teacher perceptions include only one award cycle, and student 
caution against drawing unwarranted conclusions beyond those questions. 
I. Introduction 

District Awards for Teacher Excellence Program: Final Report is a program evaluation of the 
latest in a long series of teacher incentive programs in Texas. Conducted by the National Center 
for Performance Incentives (NCPI) at Vanderbilt, the study presents findings describing the 
“experiences and outcomes for Cycle 1 districts participating in the first two years of the 
program” (p. viii). 

The state-funded program provides grants to districts for the implementation of locally designed 
incentive pay plans; the grants fund merit awards made to teachers on the basis of scores on the 
state achievement test plus other factors determined by individual districts. School districts have 
flexibility in the specifics of how they implement the plan. 

The program distributes $150 million to $197 million annually to 203 participating districts. 

The report, at nearly 500 pages, is correspondingly large. It was released in November 2010 
(partway through the project’s third year) and is based on data from Years 1 and 2. 



II. Findings and Conclusions of the Report 

The report presents 63 “key findings,” the bulk using descriptive rather than inferential 
statistics. Given the number of districts involved, the amount of data is massive. More than 
100,000 surveys were distributed to educators each spring, for example, and the results 
were subjected to a series of multiple regression analyses using a large number of predictors. 
Similarly, evaluators analyzed individual student achievement data on the Texas Assessment 
of Knowledge and Skills (TAKS) from 1,773 schools. Because “raw scale scores from TAKS 
were not expressed on the same developmental scale from one year to the next or from one 
grade to the next,” the authors constructed a “standardized test score gain” for each student 
(p. 228). 
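The report does not spell out its gain-score formula in this passage, so the following is only a minimal sketch of one common way to put scores from different grades and years on a comparable footing: convert each raw score to a z-score within its grade and year, then take the difference between a student's current and prior z-scores. The function names, record keys, and toy data are all hypothetical and are not taken from the report.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize(records):
    """Convert raw scale scores to z-scores within each (grade, year) group.

    records: list of dicts with hypothetical keys 'student', 'grade', 'year', 'score'.
    Returns {(student, year): z}.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["grade"], r["year"])].append(r["score"])
    # Mean and population SD for every grade-year group
    stats = {k: (mean(v), pstdev(v)) for k, v in groups.items()}
    z = {}
    for r in records:
        mu, sd = stats[(r["grade"], r["year"])]
        z[(r["student"], r["year"])] = (r["score"] - mu) / sd if sd > 0 else 0.0
    return z


def standardized_gains(records, prior_year, current_year):
    """Gain = current-year z-score minus prior-year z-score, for students tested in both years."""
    z = standardize(records)
    return {
        student: zc - z[(student, prior_year)]
        for (student, year), zc in z.items()
        if year == current_year and (student, prior_year) in z
    }


# Hypothetical toy data: three students tested in grade 6 (2008) and grade 7 (2009)
records = [
    {"student": "A", "grade": 6, "year": 2008, "score": 2000},
    {"student": "B", "grade": 6, "year": 2008, "score": 2100},
    {"student": "C", "grade": 6, "year": 2008, "score": 2200},
    {"student": "A", "grade": 7, "year": 2009, "score": 2150},
    {"student": "B", "grade": 7, "year": 2009, "score": 2200},
    {"student": "C", "grade": 7, "year": 2009, "score": 2400},
]
gains = standardized_gains(records, prior_year=2008, current_year=2009)
print(gains)
```

Because each student is measured relative to peers in the same grade and year, a gain above zero means the student moved up in the distribution; averaging these gains by school, as the report describes, would then yield a school-level figure.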

For this and many other elaborate analytical techniques, an extensive appendix is provided. At 
the same time, it is important to recognize that the available data come from a very short time 
period, either one or two years. The data set is immensely broad without being deep. Statistical 
significance is easily reached with such a large number of observations, but meaningful effect sizes 
(what is generally called “practical significance”) are a separate question and may not be reached here. 
Findings inferred from an aggregated number of these analyses are often expressed in the report 
in the form of broad generalizations. Some of those key findings are summarized below. 
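The gap between statistical and practical significance can be made concrete. The sketch below uses a simple two-sample z-test under a normal approximation with a known, common standard deviation (a simplifying assumption, not the report's regression models) and a hypothetical 2-point difference on a scale with a standard deviation of 100. The effect size (Cohen's d) stays fixed at 0.02 SD, far below the conventional "small" threshold of 0.2, yet the p-value drops below 0.05 once each group is large enough.

```python
import math


def two_sample_z(mean_diff, sd, n1, n2):
    """Return (Cohen's d, z statistic, two-sided p) for a difference in means.

    Assumes a common, known standard deviation sd (normal approximation).
    """
    d = mean_diff / sd                       # effect size in SD units; does not depend on n
    se = sd * math.sqrt(1 / n1 + 1 / n2)     # standard error shrinks as samples grow
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value from the standard normal
    return d, z, p


# Hypothetical: the same 2-point difference (SD = 100) at three sample sizes
for n in (500, 5_000, 50_000):
    d, z, p = two_sample_z(2.0, 100.0, n, n)
    print(f"n per group={n:>6}: d={d:.2f}, z={z:.2f}, p={p:.4f}")
```

Only the sample size changes across the three rows, which is why a "statistically significant" result from more than 100,000 surveys or 1,773 schools says little by itself about whether an effect is large enough to matter.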

District Participation 

The 203 participating districts in this voluntary program amount to 16% of all Texas districts. 
But the districts tend to be poor and urban and serve nearly half of all Texas students. Each 
district chose either to include all schools (“district-wide”) or to select particular 
underperforming schools (“select-schools”) to be included. Within some parameters (at least 
60% of funding must go to teacher bonuses, and awards to teachers must be based in part on 
scores from the TAKS), districts have wide latitude in program design and implementation. The 
authors also note the issue of selection bias. Because districts “chose to participate in D.A.T.E., 
and to design their own incentive pay plans, ... if schools that ended up participating in D.A.T.E. 
differed systematically from non-D.A.T.E. schools,” then findings of a difference in student 
performance between the two sets of schools may be due to non-D.A.T.E. factors (p. 83). 

Program Components 

District factors studied include the choice of approach (select-school or district-wide), whether 
all teachers can receive awards, and how non-designated funds are used (larger teacher awards, 
principal awards, professional development, hard-to-staff positions). School factors include who 
receives awards, award size, and the unit of accountability (individual, team, group, or hybrid). 
These factors are presented in a descriptive fashion and are also used to compare student 
outcomes and teacher awards (as exemplified below). 

Teacher Factors 

Based on Year 1 data only, some teacher factors appeared to matter. For example: 

• Teachers with a bachelor’s degree were 12-17% more likely to receive an award than 
those with no degree, but master’s and doctoral degrees did not increase this likelihood 
(p. 80); 

• Teachers new to a school were 12% less likely to earn an award than those who had been 
placed there in earlier years (p. 80); and 

• Teachers with 20 years of experience were 2-4% less likely to receive awards than those 
with 5 years (p. 78). 

Student Outcomes 

The report presents “descriptive differences” between average student passing rates on TAKS 
reading and math tests for D.A.T.E. and non-D.A.T.E. schools. The authors also constructed 
individual student gain scores based on TAKS, which they averaged by 
school in order to compare D.A.T.E. and non-D.A.T.E. schools. They could not “provide links of 
teachers to students,” the authors point out, “so it is not possible to identify the most successful 
teachers or to identify the impact of specific teachers on student performance” (p. 83). However, 
the report’s summary of findings from multiple grades and multiple subjects shows a “generally 
positive” result, and some contradictions and perplexities: 

• TAKS passing rates for reading and math were lower at D.A.T.E. schools than at 
non-D.A.T.E. schools, but the relationship between D.A.T.E. participation and average 
student achievement gains was “positive, statistically significant, but small in 
magnitude” (p. 84). For example, the difference in TAKS math passing rates between 
D.A.T.E. and non-D.A.T.E. schools for grade seven went from more than 9% in 2005-06 
to about 4% in 2009-10, with D.A.T.E. schools below the non-D.A.T.E. schools but 
moving closer. 

• That is, “...D.A.T.E. schools exhibited negative gain scores, but their scores became less 
negative (closer to zero) over time” (p. 91). 

• The unit of accountability (individual or group) “was related to student achievement in 
both reading and math, but not in a consistent direction” (p. 84). 

• “Increasing the maximum award by $1,000 was associated with an increase in TAKS 
math scores of approximately one scale score point” (p. 155), but this did not hold true for 
reading. One scale score point is a very small change. 

Educator Opinions 

Each district designed its own incentive plan within parameters, as described above. Teachers in 
D.A.T.E. schools tend to think their D.A.T.E. plans are fair, the goals worthy, and the recipients 
of awards deserving. Significantly, they do not believe the incentive plans contribute much to 
school improvement, but teachers who are positive about their plans tend to be positive about 
other aspects of their schools (p. 125). Teachers in schools with group incentives express greater 
satisfaction with their schools and the incentive program than those with individual awards, and 
perceive a more satisfied and collegial workplace (p. 154). Yet individual awards produced 
reports of greater motivation and greater competition, plus higher test score gains and incentive 
payments (p. 155). These findings are intriguing, but explanations are not suggested. 

Teacher Turnover 

Teacher turnover for 2008-09 declined in D.A.T.E. districts more than predicted based on 
previous years, but for district-wide plans (-1.3%) the change was “fully attributable to 
in-district turnover.” In select-school plans (-2.2%), the reduction occurred regardless of whether 
the schools were D.A.T.E. schools or not. This result, according to evaluators, “raises the 
possibility that some other policies” in select-school districts may have been the cause 
(pp. 108-09). Teachers who expected to receive awards tended to receive them, and those who received 
awards, particularly larger ones, were more likely to stay put. 
Differences between High- and Low-Performing Schools 



The value-added approach to measuring student gains was used to compare expected and actual 
performance on TAKS for D.A.T.E. and non-D.A.T.E. schools, and to compare program design 
differences among D.A.T.E. schools. 

High-performing schools were more likely to offer awards to principals (60%) than low- 
performing schools (15%), to offer larger awards to teachers, and to use a blended approach to teacher 
awards (group and individual elements). There were “profound differences” in school 
productivity (TAKS reading and math scores) between high- and low-performing D.A.T.E. 
schools, but few differences between schools in either district-wide or select-school districts. The 
authors caution that “This simple descriptive analysis does not establish a causal link” between 
specific D.A.T.E. design features and increased school effectiveness (p. 98). 



III. The Report’s Rationale for Its Findings and Conclusions 

“The report’s objective,” say its authors, “is to inform policymakers and practitioners as they 
consider how to move forward, how to design and implement incentive pay and compensation 
reform for educators, and the implications of those policy choices” (p. 5). 

The timeline of the study may have been dictated by the need to produce information for Texas’ 
next funding cycle. Since the state’s previous plan is described as achieving “dismal results,” the 
authors’ conclusion was probably helpful. Based on educator surveys, the “generally positive” 
test score gains on TAKS, and teacher turnover rates 1-2% lower than predicted, they conclude 
that: “...more often than not, participants in the D.A.T.E. program had a positive experience, 
student achievement gains and teacher turnover moved in a generally desirable direction, and 
teacher attitudes were favorable towards D.A.T.E.” (p. xiii). Many of the findings are descriptive 
rather than analytic (such as reasons for district participation), but may be of interest to Texas 
policy-makers. 

As a program evaluation, the report addresses questions of interest to its client rather than 
typical research questions. Still, in serving this purpose and meeting this deadline, the report 
may not meet its larger objective of informing policy-makers outside of Texas, and perhaps not 
even in Texas. The unanswered questions described below are questions that policy-makers 
should be asking. 



IV. The Report’s Use of Research Literature 

NCPI is a major research center on performance pay, and it is not surprising that the authors 
reference historical and assessment literature, some of which they generated. However, they do 
not discuss other recent research that has produced different results. For example, the positive 
responses from teachers in Texas regarding the program and the value of bonuses conflict with 
NCPI’s own well-designed study in Nashville.3 Similarly, a much publicized Mathematica study 
in Chicago of the Teacher Advancement Program found no significant results either in teacher 
retention or student achievement.4 Both the Nashville and Chicago studies used experimental or 
quasi-experimental designs, allowing for a richer analysis. 

The ongoing “ProComp” reform in Denver, arguably one of the most prominent experiments, is 
also not addressed. Denver experienced similarly modest test score gains during its Pay for 
Performance Pilot, for example, and teachers who participated in the pilot supported it. In 
Denver, however, we know why. Teachers liked the program and were happy to receive awards, 
but scorned the motivational value of these awards, which they found insulting. Instead, they 
believed that the realignment of school and district activities in support of teaching and learning 
made the difference.5 This is significant, as D.A.T.E., like many such programs, is based on a 
theory of teacher motivation rather than district support.6 That is, if D.A.T.E. teachers held 
views similar to those of Denver teachers (they liked the plan but did not believe that bonuses were 
effective change incentives), it would undercut the primary premise of incentive pay. 

The evaluators steer clear of literature questioning the undergirding program foundations, such 
as whether high-stakes use of standardized tests is wise policy.7 This is a particularly relevant 
issue given Texas’ history of standardized testing, which has sometimes been detrimental to 
poor, urban children, who were drilled hard but taught little.8 Though these kinds of references 
may not be common in program evaluations, the vital questions they pose require examination 
before programs such as D.A.T.E. are implemented. 

V. Review of the Report’s Methods 

The report addresses 34 research questions of interest to policy makers in Texas. As noted, its data 
set is very broad. It contains significant gaps, however, and covers a time span that provides an 
inadequate base for such major policy decisions. Primary methodologies include the following: 

• Program design factors based on district proposals are used to understand program 
results, including the differences between high- and low-performing schools. These 
factors, which include the size of the awards, the unit of accountability, and whether 
principals can receive awards, were used to compare teacher results and school 
productivity. Though most districts probably implemented what they proposed, the 
actual degree and fidelity of implementation is unknown. 

• Teacher and administrator opinions are drawn from more than 100,000 annual spring 
surveys from Years 1 and 2. The analysis of these surveys is described in an elaborate 
technical appendix which includes reliability and correlational analyses of clusters from 
personnel surveys, means tables for survey item clusters, tables for hierarchical linear 
modeling, a description of how control schools were selected, copies of the instruments, 
and so forth (p. 124). Still, the results include only one award cycle. 

• A value-added approach is used to compare expected and actual student performance 
between D.A.T.E. and non-D.A.T.E. schools, as measured by TAKS pass rates from 2006-07 
through 2009-10 (two years under D.A.T.E.). The evaluation includes a regression 
analysis “that allows evaluators to condition on many background characteristics” of 
students that may impact achievement (p. 89). Student factors include the percentage of 
white, LEP, and gifted and talented students. Teacher factors include years of experience 
and average salary, and school factors include teacher-student ratio and type 
(traditional, charter, etc.) (p. 231). As noted above, a value-added score is “constructed 
for each student based on previous years’ scores” (p. 228). 

• Teacher turnover data for each district (one year) is compared to projected turnover 
based on six previous years for both D.A.T.E. and non-D.A.T.E. schools. The analysis 
attempts to separate out types of turnover (leaving the school, district, or profession), 
and to control for non-D.A.T.E. factors in determining whether receiving an award and 
the size of the award influences teacher retention. 
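The turnover comparison rests on projecting each group's expected rate from its own history and then asking whether D.A.T.E. schools fell further below projection than comparison schools did. A minimal sketch follows, assuming a straight-line trend fitted by ordinary least squares to six years of hypothetical turnover rates; the report's actual projection model, and the function names used here, are assumptions for illustration only.

```python
def project_next(rates):
    """Fit an ordinary least-squares line to yearly rates and extrapolate one year ahead."""
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two years of history to fit a trend")
    x_bar = (n - 1) / 2                      # years are indexed 0..n-1
    y_bar = sum(rates) / n
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(range(n), rates))
    slope = sxy / sxx
    return y_bar + slope * (n - x_bar)       # predicted value for year index n


def deviation(history, observed):
    """Observed rate minus the rate projected from prior years (percentage points)."""
    return observed - project_next(history)


# Hypothetical turnover rates (%) for six pre-program years, then the first program year
date_dev = deviation([15.0, 15.4, 15.8, 16.2, 16.6, 17.0], observed=16.0)
comparison_dev = deviation([14.0, 14.2, 14.4, 14.6, 14.8, 15.0], observed=14.6)
net = date_dev - comparison_dev             # the part not shared with comparison schools
print(f"D.A.T.E.: {date_dev:+.1f}, comparison: {comparison_dev:+.1f}, net: {net:+.1f} points")
```

If comparison schools also fall below their projections, as the report found in select-school districts, only the net difference could plausibly be credited to the program, and even that holds only if nothing else changed differently between the two groups.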



VI. Review of the Validity of the Findings and Conclusions 

The report is generally thorough and professional, using extensive statistical techniques where 
possible, but suffers from two significant limitations: data not available and questions not asked. 
These limitations weaken the analysis and prevent the reader from discovering meaning in its 
findings. The authors regularly insert caveats as to how much can be inferred from their results, 
though they do not mention the limitation of conducting a final study in the second year of a 
three-year project. For example: 

Student Performance 

TAKS passing rates in the two years assessed increased slightly at D.A.T.E. schools, meaning 
that they lost less ground than in previous years to non-D.A.T.E. schools. But the effects were 
small, as noted above, and the timeline was short. While it may be that scores “moved in a 
generally desirable direction,” two years is too short a period to confirm a trend and the impact 
of the program is unclear. The validity of TAKS as a single measure of achievement is not raised. 

Differences Between High- and Low-Performing Districts 

Large differences in school “productivity” (as measured by TAKS reading and math scores) were 
found between high- and low-performing D.A.T.E. schools, suggesting “profound differences” in 
implementing D.A.T.E. (p. 99). High-performing districts proposed higher awards for teachers, 
rewards for principals, and hybrid approaches, among other factors. These are important 
factors, but other implementation and school differences may exist that are not explored. Given 
past instances of drilling low-income students on test-taking rather than learning,9 we should 
draw conclusions with care. It’s possible to believe that $3,000-6,000 bonuses may change 
teacher behavior, but the report can’t determine whether higher test scores are due to better 
teaching, more drill, professional development, or something else. 

Teacher Turnover 

Teacher turnover is based on one year of program data. Turnover diminished more in D.A.T.E. 
districts than statewide, as described above, but this decline “was fully attributable” to in-district 
transfers for district-wide districts. The decline in turnover in select-school districts occurred 
whether the school was a D.A.T.E. school or not, suggesting that “some other policies may have 
changed” in these districts, a non-D.A.T.E. factor. 

At the same time, individual teachers who received large awards were more likely to stay put 
than those who did not. The authors proclaim that the turnover rate “surged” among teachers 
not receiving awards, and “fell sharply” among teachers who did, terms that will doubtless be 
repeated in the press (p. 109). Many unknown factors cloud these conclusions, however, and it is 
not clear that such dramatic descriptions are appropriate for the small magnitude of the 
changes. 

Educator Opinions 

Surveys were distributed twice, covering one award cycle. They produced many interesting 
findings, but results may change after the first year. 

In one example, teachers who expected awards most often report receiving them. They also 
report greater use of professional practices and development, a potentially important finding. 
Some districts’ plans proposed funding professional development, but we don’t know whether 
the districts or schools providing the professional development are those referenced by teachers, 
as occurred in Denver, or whether teachers initiated their own professional development. This is 
unfortunate. The latter practice could indicate the successful use of bonuses to motivate teachers 
to improve, a significant result quite different from that of other studies.10 

Teachers who were positive about the program were also positive about their schools generally, 
to cite another example. But it’s possible that school satisfaction leads to program satisfaction, 
rather than the reverse. Indeed, evidence from Denver suggests that teachers rarely think in 
their daily work about possible bonuses in June, but are affected by the school culture and 
climate every day.11 It could be that teachers simply like their schools, with or without D.A.T.E., a 
different conclusion from the report. In sum, despite the report’s significant breadth, it lacks the 
data to explore its findings in greater depth. 



VII. Usefulness of the Report for Guidance of Policy and Practice 

The study presents many intriguing findings about teacher attitudes, the potential impact of 
group versus individual awards, and other program factors. But the evaluation’s design 
limitations undercut its broader use. We learn that teachers support the program, that it is 
associated with a slight improvement in test scores and a decline in teacher turnover, and that 
schools where teachers and principals received larger awards posted slightly higher test scores. 
But these generally positive findings are too small, too confounded, come in too short a time 
span and leave too many unexplored questions. Despite this wealth of detail, the reader is not 
much wiser regarding the issues and impact of performance pay by the end. 

Through their responsible caveats and generally modest conclusions, the authors acknowledge 
the study’s limitations. They sampled widely but not deeply, conducted thorough analyses of the 
data they collected, and raised cautions as to specific conclusions. If their study is used to 
summarize a massive state experiment as a prelude to a more in-depth study and analysis, it 
adds to the conversation. But if it is touted as proof that performance pay can work, it is being 
misused. It is far too early in the D.A.T.E. program, and the gaps in data are far too great, to rely 
on this report to support any conclusion that performance pay “works.” Unlike the reports 
referenced above from Nashville, Chicago and Denver, this study does not delve deeply enough 
to address this causal question. It describes the details of D.A.T.E. but cannot yet demonstrate 
that the concept of incentivizing teachers has positive or lasting results. Its findings are 
interesting, but not sufficient for guiding policy. 

One final point may be the most significant: the study does not question the underlying 
premises of performance awards for teachers: Is TAKS a legitimate measure of student 
achievement? Can test scores alone define good teaching, or might D.A.T.E. be encouraging 
unintended consequences and undesirable ends such as narrowing the curriculum, emphasizing 
test-taking over thinking, encouraging student passivity, or ignoring students’ social and 
emotional growth? Since D.A.T.E. districts serve primarily low-income, urban students, such a 
result might increase the knowledge gap even as the test-based “achievement” gap decreases. An 
evaluation that considered these basic questions of program impact would provide important 
guidance for policymakers. 